
Redesign live-to-final assistant replies #3401

Closed
franksong2702 wants to merge 21 commits into nesquena:master from franksong2702:franksong2702/live-to-final-assistant-replies

Conversation

@franksong2702

@franksong2702 franksong2702 commented Jun 2, 2026

Contributor

Thinking Path

  • Hermes WebUI's most important interaction surface is the running agent session: users need to understand live progress, tool activity, replay/recovery state, and where the final answer begins.
  • Prior fixes covered individual symptoms such as interim progress, tool cards, compression, replay, stale streams, and session switching, but the product model still needed one coherent live-to-final assistant reply lifecycle.
  • This PR implements the first slice of #3400 (Redesign live-to-final assistant replies for running agent sessions): visible progress is strengthened as a prompt contract, live-only compression state is shown while useful, settled/final content stops retaining compression status text, and stream ownership/reconnect paths avoid losing the active live reply.
  • During validation, duplicate same-session stream ownership and stale reconnect/replay behavior were blocking the UX from being reliable, so those are included as supporting fixes.

Refs #3400.
Refs #3014 and supersedes #3015.

What Changed

  • Strengthened the WebUI visible-progress prompt contract, absorbing the narrow direction of #3015 (Restore visible WebUI progress contract) into this PR:
    • long tool-running WebUI turns should not appear silent
    • visible progress must be normal assistant content, not only hidden reasoning/tool output
    • models are told not to run many independent tool batches back-to-back without visible assistant text
    • regression coverage rejects the old optional "you may provide" wording
  • Adjusted Automatic Compression UX:
    • live shows a centered non-interactive divider: Compressing context
    • completion shows Context auto-compressed while the run continues
    • settled/final Activity removes automatic-compression status text
    • the divider typography is muted and non-bold so it reads as lifecycle chrome, not assistant content
  • Hardened live reattach and replay:
    • active run-journal replay honors bounded cursor windows
    • stale cursor-only INFLIGHT state is discarded before reattach
    • explicit reconnect reopens stale CONNECTING EventSource instances
  • Fixed supporting stream ownership cases:
    • chat start rechecks same-session stream ownership under the per-session lock
    • duplicate starts for the same session reuse the current stream instead of creating a hidden ghost stream
  • Added regression coverage for visible progress prompt semantics, compression display, stale stream cleanup, and same-session inflight stream reuse.
  • Updated UI/UX docs, the run-state consistency RFC, DESIGN, and CHANGELOG for the live-only compression semantics.
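
The same-session stream reuse described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the registry dictionaries and the name `start_or_reuse_stream` are hypothetical, and the real WebUI keeps its stream state elsewhere.

```python
import threading
import uuid

# Hypothetical in-memory registries, used only for this illustration.
_session_locks: dict[str, threading.Lock] = {}
_session_locks_guard = threading.Lock()
_active_streams: dict[str, str] = {}  # session_id -> stream_id


def _lock_for(session_id: str) -> threading.Lock:
    """Return the per-session lock, creating it on first use."""
    with _session_locks_guard:
        return _session_locks.setdefault(session_id, threading.Lock())


def start_or_reuse_stream(session_id: str) -> tuple[str, bool]:
    """Return (stream_id, created); duplicate starts reuse the live stream."""
    with _lock_for(session_id):
        # Re-check ownership inside the lock so two racing starts cannot
        # both observe "no stream" and each create one (the ghost stream).
        existing = _active_streams.get(session_id)
        if existing is not None:
            return existing, False
        stream_id = uuid.uuid4().hex
        _active_streams[session_id] = stream_id
        return stream_id, True
```

Calling `start_or_reuse_stream("s1")` twice should hand back the same stream id, with `created` true only on the first call.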

Why It Matters

Running agent sessions are where users build trust in Hermes WebUI. The UI should make active work legible without confusing internal lifecycle state for final assistant content. This PR moves the experience closer to mature agent clients such as Codex and Claude Code: progress remains visible while work is happening, lifecycle detail is available when useful, and the final answer remains readable and distinct.

Contract Routing

  • Contract family: visible progress prompt contract, streaming/replay/run-state consistency, UI/UX assistant reply lifecycle, Automatic Compression display semantics.
  • Evidence used: docs/rfcs/webui-run-state-consistency-contract.md, docs/UIUX-GUIDE.md, DESIGN.md, focused frontend/static tests, run-journal replay tests, and manual 8788 live-session validation.
  • Contract change: visible interim progress for long tool-running WebUI turns is now firm prompt-contract language rather than optional guidance. Live-only Automatic Compression status is treated as transient running-session UI, not persistent settled transcript content. Final settled Activity keeps the Worklog, but removes automatic-compression status dividers.

Verification

  • node --check static/ui.js static/messages.js static/sessions.js static/workspace.js static/panels.js static/i18n.js
  • git diff --check origin/master
  • python3 scripts/ruff_lint.py --diff origin/master
    • Result: no changed Python files vs origin/master
  • python -m pytest -q tests/test_sprint42.py tests/test_auto_compression_card.py tests/test_stale_stream_cleanup.py tests/test_inflight_stream_reuse.py tests/test_run_journal_routes.py tests/test_run_journal_frontend_static.py
    • Result: 108 passed, 1 warning
  • python -m pytest tests/ -q --timeout=60 --shard-id=0 --num-shards=3
    • Result: 2391 passed, 6 skipped, 2 xpassed, 1 warning; one local failure in tests/test_profile_skills_stats.py::test_get_profile_skills_stats from the macOS platform fixture assumption, unrelated to this PR's diff
  • Manual 8788 validation:
    • Spark and MiniMax-M3 were available in the isolated dev runtime
    • live sessions triggered Auto Compression
    • Auto Compression showed Compressing context and Context auto-compressed as centered live dividers
    • automatic-compression dividers did not remain as final answer content
    • tool/lifecycle chrome was visually quieter than assistant prose in dark and light skins

Screenshots

Live running state with prose, muted tool rows, and the centered compression divider:

[Screenshot: Live light theme compressing context]

Compression completion while the run continues:

[Screenshot: Live light theme context auto-compressed]

Dark theme live state with prose, quiet tool row, token/timer footer:

[Screenshot: Dark live prose tool footer]

Expanded quiet tool rows remain visually subordinate to assistant prose:

[Screenshot: Dark expanded tool rows]

Final settled state keeps the folded L1 Worklog above assistant content:

[Screenshot: Dark final L1 worklog]

Risks / Follow-ups

  • This PR absorbs the narrow prompt-contract slice from #3015 (Restore visible WebUI progress contract) because the live-to-final assistant reply design depends on models reliably emitting visible progress prose.
  • This PR intentionally keeps the implementation slice narrower than the whole design space of #3400 (Redesign live-to-final assistant replies for running agent sessions).
  • Follow-up areas intentionally left out of this PR:
    • queue composer behavior during compression
    • explicit degraded/rebuild status during slow reattach
    • native SSE Last-Event-ID support
    • max tool-call iteration / compression-exhausted terminal taxonomy refinements
    • broader sidebar/session awareness improvements

Model Used

AI-assisted.

  • Provider: OpenAI / Codex
  • Model: GPT-5 Codex for implementation, debugging, merge preparation, and PR drafting
  • Additional validation model: GPT 5.3 Codex Spark was used in the local 8788 runtime to trigger running-session and Auto Compression scenarios

@franksong2702 franksong2702 marked this pull request as draft June 2, 2026 11:24
@franksong2702 franksong2702 force-pushed the franksong2702/live-to-final-assistant-replies branch from d33dbce to a2bf57a Compare June 2, 2026 11:39
@franksong2702 franksong2702 marked this pull request as ready for review June 2, 2026 11:47
@franksong2702 franksong2702 marked this pull request as draft June 2, 2026 12:17
@franksong2702 franksong2702 marked this pull request as ready for review June 2, 2026 12:49
@franksong2702 franksong2702 marked this pull request as draft June 2, 2026 15:00
@franksong2702 franksong2702 force-pushed the franksong2702/live-to-final-assistant-replies branch 3 times, most recently from c96e741 to a000b20 Compare June 2, 2026 20:36
@franksong2702 franksong2702 marked this pull request as ready for review June 2, 2026 20:43
@franksong2702 franksong2702 marked this pull request as draft June 3, 2026 10:09
@franksong2702 franksong2702 force-pushed the franksong2702/live-to-final-assistant-replies branch from b62fd31 to 85830b7 Compare June 3, 2026 10:24
@franksong2702 franksong2702 marked this pull request as ready for review June 3, 2026 10:43
@nesquena-hermes

Collaborator

This is a large slice (57 files, ~6k lines), so I focused on the one part with real runtime-behavior risk: the rewritten "missing final assistant reply" detection in api/streaming.py. I read the new helpers _session_lacks_final_assistant_answer (api/streaming.py:3497-3527) and _agent_result_terminal_failure (api/streaming.py:3529-3540), the call site (api/streaming.py:5690-5693), and diffed against the old guard on origin/master:api/streaming.py:5608-5609. The compression-exhausted classification (_classify_provider_error, new compression_exhausted branch) and the post-compression tool-result pruning (_prune_context_tool_results_after_compression) both look sound and defensive. One behavior change deserves a second look before merge.

The _token_sent guard was dropped, which can turn a successful tool-terminal turn into an error

On master the guard suppressed the silent-failure error whenever any text was streamed:

# origin/master:5608-5609
# _token_sent tracks whether on_token() was called (any streamed text)
if not _assistant_added and not _token_sent:

The PR replaces that with:

# HEAD:5690-5693
_terminal_failure = _agent_result_terminal_failure(result) or _session_lacks_final_assistant_answer(_all_result_messages)
if _terminal_failure:
    _assistant_added = False
if _terminal_failure or not _assistant_added:

_session_lacks_final_assistant_answer returns True whenever the transcript ends on a tool row (api/streaming.py:3505-3506):

role = msg.get('role')
if role == 'tool':
    return True

Crucially it makes this decision purely from message shape and ignores the result's success status — _agent_result_terminal_failure(result) is OR'd in, so even a result whose status is done is overridden to _assistant_added = False if the last persisted message is a tool result. The downstream branch then classifies with silent_failure=not bool(_err_str); with no error string this yields the "No response from provider" apperror. So a turn that streamed visible text and ended cleanly on a final tool batch with no closing assistant sentence now surfaces an inline error to the user, where master let it complete (because _token_sent was true).

That interaction is sharpened by this very PR: the strengthened progress contract (_WEBUI_PROGRESS_PROMPT, new lines telling the model to "say what you just confirmed and what you will check next before continuing with more tools") makes "emit prose, then run a final tool batch, then stop" a more likely shape, not less. A model that does exactly what the new prompt asks and whose last action is a successful tool call would get flagged.

Suggestion

Gate the shape-based check on the result not already reporting success, and/or restore the streamed-text escape hatch:

_terminal_failure = _agent_result_terminal_failure(result)
if not _terminal_failure and not _token_sent:
    _terminal_failure = _session_lacks_final_assistant_answer(_all_result_messages)

That keeps the genuine target case (agent fails mid-tool-run, nothing streamed, no final answer) firing, while not penalizing a turn that produced visible progress and merely ended on a tool result. If ending-on-tool is intentionally treated as failure even when text streamed, please say so in a comment near api/streaming.py:3505 and add a test_live_stream_ux case pinning "streamed text + tool-terminal + success status" to the chosen outcome — right now _session_lacks_final_assistant_answer's status-blindness isn't covered by an assertion that distinguishes it from _agent_result_terminal_failure.
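
To make the requested pin concrete, here is a minimal, self-contained sketch of the gated decision. The helper bodies are simplified stand-ins, not the code in api/streaming.py, and the status strings (`failed`, `partial`, `compression_exhausted`, `done`) are assumptions chosen for illustration.

```python
# Assumed failure statuses; the real result object may use other values.
FAILURE_STATUSES = {"failed", "partial", "compression_exhausted"}


def agent_result_terminal_failure(result: dict) -> bool:
    """Explicit signal: trust the status the agent itself reported."""
    return result.get("status") in FAILURE_STATUSES


def session_lacks_final_assistant_answer(messages: list[dict]) -> bool:
    """Shape-only check: does the transcript end without assistant text?"""
    if not messages:
        return True
    last = messages[-1]
    if last.get("role") != "assistant":
        return True  # e.g. the transcript ends on a tool row
    return not str(last.get("content") or "").strip()


def is_terminal_failure(result: dict, messages: list[dict], token_sent: bool) -> bool:
    """Explicit failure always wins; the shape check runs only if nothing streamed."""
    if agent_result_terminal_failure(result):
        return True
    if token_sent:
        return False  # visible text reached the user, so do not raise an error
    return session_lacks_final_assistant_answer(messages)
```

With this gating, "streamed text + tool-terminal + success status" evaluates to not-a-failure, while the same transcript with nothing streamed still trips the guard.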

Everything else I sampled in streaming.py reads cleanly. I didn't execute the suite (cron policy), but the diff's own claim of 108 passed on the focused files plus the message-wording changes (Compressing context / Compression finished) line up with the test_auto_compression_card.py updates. Worth splitting the non-streaming doc/CHANGELOG churn from the behavior change if you want a tighter review surface, but that's process, not correctness.

franksong2702 pushed a commit to franksong2702/hermes-webui-fork that referenced this pull request Jun 4, 2026
The provider/model reasoning-effort coercion (coerce_reasoning_effort_for_model,
_filter_reasoning_efforts_for_provider, and their call-site wrappers) is
unrelated to the live-to-final assistant reply experience and changes behavior
for all reasoning-capable models. Reverting it here keeps PR nesquena#3401 focused on
live stream / worklog / auto-compression / stream-ownership. The StreamChannel
snapshot/event-id changes in config.py are part of the live-stream replay work
and intentionally remain.

The coercion ships separately so it gets a provider-capability-focused review.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@greptile-apps

greptile-apps Bot commented Jun 4, 2026


Greptile Summary

This PR implements the first slice of a cohesive live-to-final assistant reply lifecycle for Hermes WebUI (#3400): it hardens the visible-progress prompt contract, redesigns Automatic Compression UX (live divider → "Compressing context" / "Context auto-compressed" / removed from settled transcript), and fixes several stream ownership and replay bugs that were blocking a reliable live session experience.

  • Prompt contract: _WEBUI_PROGRESS_PROMPT now requires visible interim prose between tool batches; regression tests confirm the old optional wording is rejected.
  • Reconnect & replay hardening: StreamChannel gains subscribe_with_snapshot and note_last_event_id; the SSE handler replays the journal up to the subscriber's last_event_id cursor and then skips already-seen events from the offline buffer, resolving the duplicate-delivery issue flagged in earlier reviews.
  • Session-switch lifecycle: _start_chat_stream_for_session re-checks stream ownership inside the session lock; stale CONNECTING EventSource instances are now torn down on explicit reconnect; stale cursor-only INFLIGHT entries are discarded before re-attach; and loadSession propagates activityBurstAnchors, lastAssistantText, and related fields so multi-burst live turns survive session switches.

Confidence Score: 5/5

This PR is safe to merge. The reconnect dedup logic, stream ownership re-check under lock, and INFLIGHT lifecycle changes are all well-tested and the primary code paths are correct.

The three issues flagged in earlier reviews (undefined _completeAutomaticCompressionOnLiveProgress, reconnect dedup no-op, indentation) are all addressed. The new StreamChannel snapshot mechanism correctly pairs note_last_event_id with put_nowait to keep _last_event_id in sync, and the snapshot_cutoff_seq dedup guard prevents offline-buffer duplicates on reconnect. The _start_chat_stream_for_session while-loop re-check is bounded in practice. The remaining findings are style nits with no behavioral impact.

No files require special attention. The commented-out line in sessions.js and the minor _hashString allocation in messages.js are cosmetic.

Important Files Changed

| Filename | Overview |
| --- | --- |
| api/config.py | StreamChannel gains subscribe_with_snapshot, note_last_event_id, and 3-tuple put_nowait; _last_event_id is now maintained atomically under _lock; well-structured and tested. |
| api/routes.py | Replay dedup logic (snapshot_cutoff_seq / replay_cutoff_seq) is correct for the common case; _start_chat_stream_for_session loop correctly re-checks ownership under lock. |
| api/streaming.py | Adds 3-tuple event queueing with note_last_event_id, post-compression tool-result pruning, and updated compression message strings; changes are focused and low-risk. |
| api/run_journal.py | Adds max_seq parameter to read_run_events for upper-bound filtering; minimal, correct change. |
| static/messages.js | _completeAutomaticCompressionOnLiveProgress now defined; activityBurstAnchor / segmentSeq tracking added to INFLIGHT; _hashString allocates String(value |
| static/sessions.js | loadSession INFLIGHT deletion for journalReplayFromStart/stale cursor entries is correct; _loadingSessionId guard fix allows same-session reload during pending switches; commented-out line left in the non-INFLIGHT active-stream branch. |
| static/ui.js | Large addition of worklog/live-run-status helpers; ensureLiveWorklogShell, showLiveRunStatus, _moveLiveRunStatusToTurnEnd all well-guarded with typeof checks; _stripVisibleAssistantEchoFromThinking semantics narrowed to exact-match only. |
| static/style.css | Large style rework for live-worklog, compression divider, run-status footer; muted typography for lifecycle chrome vs assistant prose. |
| tests/test_inflight_stream_reuse.py | New static-analysis tests covering same-stream reuse, CONNECTING transport rejection on reconnect, and same-session no-op guard fix. |
| tests/test_auto_compression_card.py | Tests for _completeAutomaticCompressionOnLiveProgress definition and per-event-listener call, and elapsed-timer no-op after PR changes. |

Sequence Diagram

sequenceDiagram
    participant FE as Frontend (sessions.js / messages.js)
    participant SC as StreamChannel (config.py)
    participant RJ as RunJournal (run_journal.py)
    participant SSE as SSE Handler (routes.py)
    participant Worker as Streaming Worker (streaming.py)

    Note over FE,Worker: Live turn in progress
    Worker->>SC: note_last_event_id(event_id)
    Worker->>SC: put_nowait((event, data, event_id))
    SC->>SC: "_last_event_id = event_id"
    SC-->>FE: broadcast 3-tuple to active subscribers

    Note over FE,Worker: User switches session — SSE disconnects
    SC->>SC: Buffer items in _offline_buffer

    Note over FE,Worker: User switches back — reconnect
    FE->>SSE: "GET /api/chat/stream?replay=1&after_seq=N&after_event_id=X"
    SSE->>SC: subscribe_with_snapshot()
    SC-->>SSE: "(subscriber_queue, {last_event_id: Y})"
    SSE->>SSE: "snapshot_cutoff_seq = parse(Y)"
    SSE->>RJ: "_replay_run_journal(after_seq=N, max_seq=Y, include_stale=False)"
    RJ-->>SSE: journal events N+1 to Y
    SSE-->>FE: replay events via SSE
    SSE->>SSE: "replay_cutoff_seq = Y"

    loop Live stream tail
        SC-->>SSE: item from subscriber_queue
        SSE->>SSE: "event_seq = parse(item.event_id)"
        alt "event_seq <= replay_cutoff_seq"
            SSE->>SSE: skip duplicate
        else "event_seq > replay_cutoff_seq"
            SSE-->>FE: emit SSE event
        end
    end
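
The replay-then-tail flow in the diagram can be approximated with the sketch below. It is an illustration under assumptions: event ids are taken to be either a bare integer or `<run>:<seq>`, and `parse_seq` and `replay_then_tail` are hypothetical names rather than functions from api/routes.py.

```python
from typing import Iterable, Iterator, Optional


def parse_seq(event_id: Optional[str]) -> Optional[int]:
    """Extract the sequence number from an assumed '<run>:<seq>' or '<seq>' id."""
    if not event_id:
        return None
    try:
        return int(event_id.rsplit(":", 1)[-1])
    except ValueError:
        return None


def replay_then_tail(
    journal: Iterable[tuple[int, str]],
    live_items: Iterable[tuple[str, str]],
    after_seq: int,
    snapshot_last_event_id: Optional[str],
) -> Iterator[tuple[Optional[int], str]]:
    """Replay journal events in (after_seq, cutoff], then tail live items past the cutoff."""
    cutoff = parse_seq(snapshot_last_event_id)
    for seq, payload in journal:
        if seq <= after_seq:
            continue  # the client already has this event
        if cutoff is not None and seq > cutoff:
            continue  # newer than the snapshot; the live queue will deliver it
        yield seq, payload
    for event_id, payload in live_items:
        seq = parse_seq(event_id)
        if cutoff is not None and seq is not None and seq <= cutoff:
            continue  # already delivered by the journal replay above
        yield seq, payload
```

If the snapshot carries no event id, `cutoff` is `None` and nothing is skipped, so buffered events can be delivered twice. That is why the cutoff only helps when the channel's last event id is actually kept up to date.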

Reviews (17): Last reviewed commit: "Close #3401 merge review test gaps" | Re-trigger Greptile

Comment thread api/routes.py
Comment thread static/sessions.js Outdated
@franksong2702

Contributor Author

Pushed follow-up commits addressing review feedback. Summary of what changed on this branch:

207c09f9 — Fix false "no response" on streamed tool-terminal turns (addresses @nesquena-hermes's review)
The missing-final-assistant guard OR'd the status-blind _session_lacks_final_assistant_answer check in unconditionally, so a successful turn that streamed visible progress and ended on a final tool batch (or a leaked role=user control message) was reclassified as a terminal failure and surfaced a false error. The shape check is now gated behind not _token_sent, while _agent_result_terminal_failure(result) stays authoritative for explicit failure/partial/compression-exhausted status. Added behavioral unit tests for both helpers (tool-tail, user-tail, empty-messages, error-tail, success) and realigned the static guard to the corrected semantics.

4c48968b — Removed the reasoning-effort coercion from this PR. It's unrelated to the live-to-final reply experience and changes behavior for all reasoning-capable models, so it now ships as its own focused PR (#3505). The StreamChannel snapshot/event-id changes in api/config.py stay here because they're part of the reconnect-replay work.

b69bad5a — Addresses the Greptile review:

  • P1 (reconnect dedup was a no-op): correct catch — the worker keeps the queue 2-tuple (Stage-364) and propagates the journal event id via the STREAM_LAST_EVENT_ID side-channel, so StreamChannel._last_event_id was never populated, snapshot_cutoff_seq/replay_cutoff_seq never engaged, and offline-buffer events could be redelivered after journal replay. Rather than switch to a 3-tuple (which would break the Stage-364 design and its static tests), added StreamChannel.note_last_event_id() and call it from the worker put() and cancel_stream, so the cutoff now actually engages while the 2-tuple queue shape is preserved. Added a unit test for the snapshot wiring plus a static guard so put() can't silently regress to inert.
  • P2 (unbounded while True): the ownership-recheck loop is now capped at 3 attempts and returns 409 instead of spinning if stale cleanup keeps succeeding while another thread re-claims the session.
  • P2 (misleading indentation in loadSession): re-indented the INFLIGHT block; whitespace-only (git diff -w shows no non-whitespace change, node --check passes).
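
For reviewers, the capped ownership recheck can be pictured roughly like this. It is a sketch under assumptions, not the code in api/routes.py: `claim_stream`, the `cleanup_stale` callback, and returning 200 when a live owner is reused are all hypothetical choices made for illustration.

```python
import threading
from typing import Callable, Optional

MAX_CLAIM_ATTEMPTS = 3  # the cap described above


def claim_stream(
    session_id: str,
    new_stream_id: str,
    active: dict[str, str],
    lock: threading.Lock,
    cleanup_stale: Callable[[str, str], bool],
) -> tuple[int, Optional[str]]:
    """Return (http_status, stream_id) after at most MAX_CLAIM_ATTEMPTS rechecks."""
    for _ in range(MAX_CLAIM_ATTEMPTS):
        with lock:
            owner = active.get(session_id)
            if owner is None:
                active[session_id] = new_stream_id
                return 200, new_stream_id
        # An owner exists. Cleanup runs outside the lock and reports whether
        # it removed a stale stream; another thread may re-claim meanwhile.
        if not cleanup_stale(session_id, owner):
            return 200, owner  # the owner is live, so reuse it
    # Stale cleanup kept succeeding while someone else kept re-claiming.
    return 409, None
```

The bound turns a potential spin into a definite 409 response for the caller.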

Local verification: tests/test_cancelled_turn_status.py, test_webui_runtime_diagnostics.py, test_stage364_opus_live_sse_event_id.py, test_run_journal_routes.py, test_stale_stream_cleanup.py, test_inflight_stream_reuse.py, test_run_journal_streaming_static.py, test_sprint42.py, test_regressions.py all pass locally; full matrix runs in CI.

Note: this branch will periodically re-conflict on CHANGELOG.md only (the [Unreleased] block) whenever a new release lands on master — no code conflicts.

🤖 Generated with Claude Code

@franksong2702

Contributor Author

Updated with merge commit 53727cb to bring the branch onto latest master, resolve the CHANGELOG conflict, and remove the duplicate _active_stream_ids import noted in review. Local verification: git diff --check; pytest tests/test_cancelled_turn_status.py tests/test_webui_runtime_diagnostics.py tests/test_stage364_opus_live_sse_event_id.py tests/test_run_journal_routes.py tests/test_stale_stream_cleanup.py tests/test_inflight_stream_reuse.py tests/test_run_journal_streaming_static.py tests/test_sprint42.py tests/test_regressions.py -q (179 passed).

@franksong2702

Contributor Author

Conflict cleanup and CI fallout fixes pushed through 183e8258.

What changed:

  • Merged latest origin/master into franksong2702/live-to-final-assistant-replies and resolved conflicts in the live-to-final scope.
  • Restored merge-lost static UI invariants from current master:
    • partial tool-call assistant rows remain visible/anchorable;
    • thinking-only messages in simplified tool-calling mode render inline instead of inside a collapsed activity group;
    • persistent-state tool_complete notifications see tc.is_error before toast classification;
    • live compression card replacement restores the scroll snapshot before follow-settle.
  • No active Stop changes from #3345 (fix(runtime): make cancelStream() owner-aware and close its SSE source) or unrelated mobile titlebar changes were folded into this PR.

Verification:

  • python -m pytest tests/test_issue401.py tests/test_issue3592_thinking_settlement.py -q -> 16 passed
  • python -m pytest tests/test_issue3340_persistent_state_toasts.py tests/test_issue3479_ios_stream_scroll_jump.py tests/test_issue401.py tests/test_issue3592_thinking_settlement.py -q -> 27 passed
  • python -m pytest tests/test_cancelled_turn_status.py tests/test_webui_runtime_diagnostics.py tests/test_stage364_opus_live_sse_event_id.py tests/test_run_journal_routes.py tests/test_stale_stream_cleanup.py tests/test_inflight_stream_reuse.py tests/test_run_journal_streaming_static.py tests/test_sprint42.py tests/test_regressions.py tests/test_webui_gateway_chat_backend.py -q -> 200 passed
  • node --check static/messages.js static/ui.js static/sessions.js static/boot.js
  • git diff --check
  • GitHub Actions: 11/11 passed on 183e8258

AI assistance: Codex coordinated the merge cleanup with a sub-agent, reviewed the returned diff, fixed the CI-reported merge fallout, and reran focused verification before pushing.

@franksong2702

Contributor Author

Conflict cleanup pushed in 19598a70.

What changed:

  • Merged latest origin/master (4c545a33) into franksong2702/live-to-final-assistant-replies.
  • Resolved conflicts in CHANGELOG.md and static/messages.js.
  • Kept this PR's live-to-final Unreleased notes while preserving current release history.
  • Preserved the current master terminal-event stale-stream bail-out in done while keeping this PR's immediate _streamFinalized behavior, so the prior merge-lost static UI invariants remain intact.

Verification:

  • git diff --check -> passed
  • node --check static/messages.js static/sessions.js static/ui.js static/boot.js -> passed
  • /Users/xuefusong/.hermes/hermes-agent/venv/bin/python -m pytest tests/test_cancelled_turn_status.py tests/test_webui_runtime_diagnostics.py tests/test_stage364_opus_live_sse_event_id.py tests/test_run_journal_routes.py tests/test_stale_stream_cleanup.py tests/test_inflight_stream_reuse.py tests/test_run_journal_streaming_static.py tests/test_cancel_stream_owner_guard.py tests/test_issue3587_intermediate_reasoning.py tests/test_session_events.py -q -> 132 passed
  • GitHub Actions on 19598a70 -> 11/11 passed
  • GitHub mergeability -> MERGEABLE / CLEAN

AI assistance: Codex performed the conflict cleanup and focused regression review.

franksong2702 pushed a commit to franksong2702/hermes-webui-fork that referenced this pull request Jun 5, 2026
@franksong2702

Contributor Author

Conflict cleanup pushed in 72b838fd.

What changed:

  • Merged latest origin/master (f1211e1f, v0.51.267) into franksong2702/live-to-final-assistant-replies.
  • Resolved the CHANGELOG.md conflict by keeping this PR's live-to-final notes in Unreleased and preserving the new v0.51.267 security release notes.
  • api/routes.py auto-merged with the v0.51.267 security hardening changes; no manual behavior conflict was needed there.

Verification:

  • git diff --check origin/master..HEAD -> passed
  • node --check static/messages.js static/sessions.js static/ui.js static/boot.js -> passed
  • python -m pytest tests/test_cancelled_turn_status.py tests/test_webui_runtime_diagnostics.py tests/test_stage364_opus_live_sse_event_id.py tests/test_run_journal_routes.py tests/test_stale_stream_cleanup.py tests/test_inflight_stream_reuse.py tests/test_run_journal_streaming_static.py tests/test_cancel_stream_owner_guard.py tests/test_issue3587_intermediate_reasoning.py tests/test_session_events.py tests/test_issue1909_csrf_token.py tests/test_issue2931_edge_tts_endpoint.py tests/test_sprint29.py -q -> 216 passed
  • GitHub Actions on 72b838fd -> 11/11 passed
  • GitHub mergeability -> MERGEABLE / CLEAN

AI assistance: Codex Autopilot performed the branch refresh, reviewed final scope, ran focused verification, pushed, and read back GitHub state.

@franksong2702

Contributor Author

Updated the #3401 Thinking/Worklog blocker.

Product model:

  • Worklog remains the live-to-final record for an assistant turn.
  • Thinking is preserved as its own Worklog Thinking Card, sibling to process prose and Tool Cards.
  • Thinking is not promoted into final answer text and is not treated as a Tool Card.

Implementation:

  • Live Thinking Cards are now segment-scoped, so later reasoning does not keep updating the first Thinking Card in the turn.
  • Settled rendering keeps Thinking Cards in the folded Worklog.
  • Duplicate suppression is intentionally narrow: exact / normalized-exact only against visible process/final text from the same assistant turn.
  • Reasoning metadata is preserved even when the visible Thinking Card is suppressed.

Tradeoff:

  • Partial overlap and semantic dedupe are intentionally not handled in this PR. That avoids live content jumping and avoids model/provider-specific behavior.
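
As an illustration of how narrow the suppression is, here is a small Python sketch of an exact / normalized-exact check. The real logic lives in the frontend JavaScript; the normalization shown (Unicode NFKC plus whitespace collapsing) and the function names are assumptions, not the shipped rules.

```python
import re
import unicodedata


def normalize(text: str) -> str:
    """Assumed normalization: NFKC, collapse whitespace runs, trim the ends."""
    text = unicodedata.normalize("NFKC", text)
    return re.sub(r"\s+", " ", text).strip()


def should_hide_thinking(thinking: str, visible_texts: list[str]) -> bool:
    """Hide a Thinking Card only if it equals visible text after normalization."""
    key = normalize(thinking)
    if not key:
        return False  # an empty card is not treated as an echo here
    return any(normalize(visible) == key for visible in visible_texts)
```

A Thinking segment that repeats only part of the visible prose does not match, so it stays in the Worklog, which is the tradeoff stated above.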

Verification:

  • node --check static/messages.js static/ui.js
  • pytest tests/test_regressions.py tests/test_ui_tool_call_cleanup.py tests/test_issue2565_reasoning_accumulation.py tests/test_issue3592_thinking_settlement.py tests/test_issue_progress_echo_dedupe.py tests/test_issue2454_active_session_spinner.py
  • npx --yes eslint@10.4.0 --no-config-lookup -c eslint.runtime-guard.config.mjs "static/**/*.js"

Frank Song added 2 commits June 6, 2026 18:07
…to-final-assistant-replies

# Conflicts:
#	api/streaming.py
#	static/messages.js
#	static/ui.js
#	tests/test_issue765_streaming_persistence.py
#	tests/test_ui_tool_call_cleanup.py
@franksong2702

Contributor Author

Follow-up after the latest refresh/re-gate:

  • Refreshed #3401 onto current origin/master and restored the reviewer-branch guardrails that were easy to lose during conflict resolution: idle attention-dot visibility/color, the removal of the dead settleLiveCompressionCards(), and stale test expectations around the #3316 terminal-failure semantics (Fix compression-exhausted stream finalization) and the #3401 Thinking replay.
  • The intended product model is unchanged: process prose, Thinking Card, and Tool Card are sibling items inside the Worklog. Thinking is not silently dropped and is not treated as a Tool Card or Final Answer.
  • Settled duplicate suppression is intentionally narrow: only exact / normalized-exact Thinking echoes are hidden from the folded Worklog. We keep the underlying reasoning metadata, preserve non-exact provider reasoning, and do not attempt partial-overlap, semantic, or model-specific dedupe. That avoids live-stream content jumping while still removing the obvious Spark/openai-codex exact echo in the settled view.
  • Latest head 002a57a6 is green on Browser smoke and the full Tests matrix, including lint and the Scope / undefined-reference gate.

nesquena-hermes added a commit that referenced this pull request Jun 7, 2026
…ds on reconnect, fixes #3707) (#3766)

* fix(streaming): replay restored live tool cards on reconnect (#3763, fixes #3707)

Post-#3401 (#3400 live-to-final epic) recovery residual. When a running session
is restored from its in-memory live-turn snapshot and then reattached to the SSE
stream, the restore-success path skipped replaying persisted live tool calls,
leaving restored live text/thinking but an EMPTY Worklog until a later SSE event
or the final render rebuilt the turn.

- Extract the persisted-tool-card replay into replayPersistedLiveToolCards()
  (reads S.toolCalls or INFLIGHT[sid].toolCalls); run it on restoredLiveTurn &&
  didReconnect, not only the !restoredLiveTurn fallback.
- Dedup safety: restore-success replay passes {skipUnkeyedRestoredDuplicates:true}
  — when the restored snapshot already has .tool-card-row rows, an UNKEYED
  persisted tool is skipped to avoid a duplicate; keyed cards still replay and
  appendLiveToolCard's tid-dedup replaces the correct restored row.
- appendLiveToolCard() and the new liveToolReplayId() both key on
  tid||id||tool_call_id||tool_use_id||call_id (consistent 5-alias set), so the
  dedup covers all known id shapes.
- Both replay sites pass {sessionId, streamId} so the ownership guard applies.
- Regression coverage: restore-success+reconnect replays tools; unkeyed-restored
  duplicates skipped; all-id-alias dedup; prior ordering invariants preserved.
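
Illustrative sketch only (Python, whereas the actual change is in the frontend JavaScript): the alias lookup and the unkeyed-duplicate skip described above might look like this. The alias order follows the commit text; `live_tool_replay_id` mirrors the name liveToolReplayId(), and `plan_replay` is a hypothetical helper.

```python
from typing import Optional

# The five id aliases listed above, tried in order; first truthy value wins.
ID_ALIASES = ("tid", "id", "tool_call_id", "tool_use_id", "call_id")


def live_tool_replay_id(tool: dict) -> Optional[str]:
    """Return the first non-empty id alias, or None for an unkeyed tool."""
    for key in ID_ALIASES:
        value = tool.get(key)
        if value:
            return str(value)
    return None


def plan_replay(
    persisted: list[dict],
    restored_has_rows: bool,
    skip_unkeyed_restored_duplicates: bool,
) -> list[dict]:
    """Pick which persisted tool calls to replay onto the restored turn."""
    plan = []
    for tool in persisted:
        unkeyed = live_tool_replay_id(tool) is None
        if unkeyed and skip_unkeyed_restored_duplicates and restored_has_rows:
            continue  # cannot be matched to a restored row, so skip it
        plan.append(tool)  # keyed cards replay; id-based dedup replaces rows
    return plan
```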

Correct post-#3401 fix for #3707 (supersedes the closed #3724).

Co-authored-by: franksong2702 <[email protected]>

* docs(changelog): stamp v0.51.309 — Release JY (stage-a5b #3763)

---------

Co-authored-by: nesquena-hermes <[email protected]>
nesquena-hermes pushed a commit that referenced this pull request Jun 9, 2026
* docs(rfc): add Transparent Stream activity display mode RFC (#3820)

Proposes Transparent Stream as an opt-in, chronological activity display
mode alongside the default Compact Worklog (#3400/#3401). Captures the
display-mode split agreed in #3820: each tool call as a first-class
chronological event, interleaved with reasoning/progress, with compact
previews, consistent across live, settled, and reload/replay paths.

Documents the asymmetry in the existing `simplified_tool_calling` toggle
(live-only, no settled/reload branch) and the three concrete integration
points so the follow-up can be sliced safely. Doc-only; no behavior change.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* docs(rfc): refine Transparent Stream rollout scope

---------

Co-authored-by: Frank Song <franksong2702@gmail.com>
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
nesquena-hermes added a commit that referenced this pull request Jun 9, 2026
/#3876) (#3886)

Fixes #3869: empty legacy three-dot thinking spinners piled up as stale
rows after the agent finished thinking. The live-to-final redesign (#3401)
made the thinking-card-row wrapper class unconditional, which broke
finalizeThinkingCard()'s dots-only detection — it treated the wrapper class
itself as a "has content" signal, so the dots-only removal branch went dead.
Narrow hasContent to the actual .thinking-card element so dots-only spinners
are removed on finalize while real Worklog Thinking Cards are preserved.

Includes #3869 regression coverage (brace-walks finalizeThinkingCard, asserts
the narrowed check + that real thinking cards are not removed).
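
A rough sketch of the narrowed check, for illustration only. The function name `finalizeThinkingRow` and its return values are hypothetical, and `row` is assumed to be any element-like object exposing `querySelector` and `remove` (a real DOM element in the browser).

```javascript
// Hypothetical sketch of the narrowed dots-only detection (not the real finalizeThinkingCard).
function finalizeThinkingRow(row) {
  // Only an actual .thinking-card descendant counts as content. The wrapper's
  // own thinking-card-row class is ignored, because after #3401 every row
  // carries it and it no longer distinguishes real cards from bare spinners.
  const hasContent = row.querySelector('.thinking-card') !== null;
  if (!hasContent) {
    row.remove(); // dots-only spinner: drop the stale row
    return 'removed';
  }
  return 'kept'; // real Worklog Thinking Card: preserve it
}
```

The class selector `.thinking-card` matches whole class tokens, so it does not match an element whose only class is `thinking-card-row`.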

Co-authored-by: nesquena-hermes <[email protected]>
Co-authored-by: franksong2702 <franksong2702@users.noreply.github.com>
nesquena-hermes pushed a commit that referenced this pull request Jun 9, 2026
Fixes #3875: chat transcript rendering as only a stack of date separators
with no message bodies. The live-to-final/Worklog redesign (#3401) folds
intermediate assistant segments into a collapsed Worklog and hides the source
segment; when a turn's ONLY content is folded into a collapsed Worklog (empty
final assistant message from an interrupted/autonomous run, or a reload where
S.toolCalls did not hydrate so the Worklog has no expandable steps), every
segment is hidden and the turn paints blank — leaving a bare column of date
dividers.

Adds a defensive fail-safe invariant at the end of renderMessages(): a settled
assistant turn never renders with zero visible content. Blank turns get their
folded Worklog expanded (or hidden segments un-hidden as a last resort). Turns
with any visible answer are untouched, preserving the intended collapsed-Worklog
UX. Reproduced + verified fixed in an isolated browser (clean Chrome profile to
defeat the ?v= asset-cache); RED on master (blank 'Worklog' chip), GREEN with
the fix (Worklog expanded, content visible).

Includes #3875 structural regression coverage.
nesquena-hermes added a commit that referenced this pull request Jun 9, 2026
… + #3887 + #3831) (#3889)

* Release v0.51.342 — Release LF (blank-transcript brick fix #3875)

* docs(ui): clarify revealed-flag intent in #3875 fail-safe (greptile P2)

Address greptile review on PR #3889: the 'revealed' flag means 'turn has a
visible non-empty Worklog group' not 'we just expanded one'. An already-open
non-empty group is itself visible, so the last-resort un-hide is correctly
skipped. Comment-only; no behavior change.
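
The fail-safe and the `revealed` semantics can be sketched on a plain data model. Everything here is an assumption made for illustration: the turn shape (`segments` with `text`/`hidden`, an optional `worklog` with `collapsed`/`steps`) and the name `ensureSettledTurnVisible` are hypothetical, and the real check runs against rendered DOM at the end of renderMessages().

```javascript
// Hypothetical sketch of the "never render a blank settled turn" invariant.
// turn: { segments: [{ text, hidden }], worklog: { collapsed, steps } | null }
function ensureSettledTurnVisible(turn) {
  const hasVisibleAnswer = turn.segments.some(
    (seg) => !seg.hidden && seg.text.trim() !== ''
  );
  // Turns with any visible answer keep the intended collapsed-Worklog UX.
  if (hasVisibleAnswer) return turn;

  // revealed = "the turn has a visible, non-empty Worklog group",
  // whether it was just expanded or was already open.
  let revealed = false;
  if (turn.worklog && turn.worklog.steps.length > 0) {
    turn.worklog.collapsed = false;
    revealed = true;
  }

  // Last resort: nothing expandable, so un-hide the folded segments.
  if (!revealed) {
    for (const seg of turn.segments) seg.hidden = false;
  }
  return turn;
}
```

An already-open, non-empty Worklog sets `revealed` without changing anything, which is why the last-resort un-hide is skipped in that case.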

---------

Co-authored-by: nesquena-hermes <[email protected]>

Labels

changes-requested Maintainer left detailed feedback requesting changes; PR is waiting on author to address hold
